NSF PAR Search | NSF Public Access Repository

Effectively Learning Initiation Sets in Hierarchical Reinforcement Learning

Bagaria, Akhil; Abbatematteo, Ben; Gottesman, Omer; Corsaro, Matt; Rammohan, Sreehari; Konidaris, George (December 2023, 37th Conference on Neural Information Processing Systems)

An agent learning an option in hierarchical reinforcement learning must solve three problems: identify the option’s subgoal (termination condition), learn a policy, and learn where that policy will succeed (initiation set). The termination condition is typically identified first, but the option policy and initiation set must be learned simultaneously, which is challenging because the initiation set depends on the option policy, which changes as the agent learns. Consequently, data obtained from option execution becomes invalid over time, leading to an inaccurate initiation set that subsequently harms downstream task performance. We highlight three issues—data non-stationarity, temporal credit assignment, and pessimism—specific to learning initiation sets, and propose to address them using tools from off-policy value estimation and classification. We show that our method learns higher-quality initiation sets faster than existing methods (in MINIGRID and MONTEZUMA’S REVENGE), can automatically discover promising grasps for robot manipulation (in ROBOSUITE), and improves the performance of a state-of-the-art option discovery method in a challenging maze navigation task in MuJoCo.
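The three components named in the abstract (termination condition, policy, initiation set) can be pictured with a toy example. The following is only an illustrative sketch, not the authors' method: the `Option` class, the `run_option` helper, and the 11-state corridor are hypothetical names and assumptions introduced here. It shows why the initiation set depends on the policy: a policy that only moves right can reach the subgoal only from states at or to the left of it.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

State = int   # hypothetical toy state: a position in a 1-D corridor
Action = int  # hypothetical toy action: -1 (left) or +1 (right)


@dataclass
class Option:
    initiation: Callable[[State], bool]   # states where the option may start
    policy: Callable[[State], Action]     # action chosen while the option runs
    termination: Callable[[State], bool]  # subgoal test


def run_option(option: Option, state: State,
               step: Callable[[State, Action], State],
               max_steps: int = 100) -> Tuple[State, bool]:
    """Run the option from `state`; return (final state, reached subgoal?)."""
    if not option.initiation(state):
        raise ValueError(f"state {state} is outside the initiation set")
    for _ in range(max_steps):
        if option.termination(state):
            return state, True
        state = step(state, option.policy(state))
    return state, option.termination(state)


def corridor_step(state: State, action: Action) -> State:
    """Assumed toy dynamics: positions 0..10, moves clipped at the walls."""
    return max(0, min(10, state + action))


# The subgoal is position 5 and the policy always moves right, so the option
# can only succeed from positions 0..5. That range is its true initiation set.
go_to_5 = Option(
    initiation=lambda s: s <= 5,
    policy=lambda s: 1,
    termination=lambda s: s == 5,
)

# An over-optimistic initiation set admits states from which the policy fails.
too_permissive = Option(
    initiation=lambda s: True,
    policy=go_to_5.policy,
    termination=go_to_5.termination,
)
```

If the policy were later changed to move toward position 5 from either side, every position would belong to the initiation set, so labels collected under the old policy would be stale. This is a small-scale version of the non-stationarity the abstract describes.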


We propose a new method for count-based exploration in high-dimensional state spaces. Unlike previous work, which relies on density models, we show that counts can be derived by averaging samples from the Rademacher distribution (or coin flips). This insight is used to set up a simple supervised learning objective which, when optimized, yields a state’s visitation count. We show that our method is significantly more effective at deducing ground-truth visitation counts than previous work; when used as an exploration bonus for a model-free reinforcement learning algorithm, it outperforms existing approaches on most of nine challenging exploration tasks, including the Atari game MONTEZUMA’S REVENGE.
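The coin-flip idea can be checked numerically. The average of n independent Rademacher (±1) samples has second moment 1/n, so the squared average recovers the inverse visit count. The sketch below is an assumption-laden illustration for a single state, not the paper's implementation: the function name `estimate_count` and the vector dimension are hypothetical. In place of a learned regressor it uses the closed-form minimizer of squared error to the random labels, which is their mean.

```python
import numpy as np


def estimate_count(n_visits, dim=4096, rng=None):
    """Estimate how often one state was visited from random coin flips.

    Each visit is paired with a fresh Rademacher vector (entries +1 or -1).
    The squared-error-optimal prediction for the state is the mean of those
    vectors, and each coordinate of that mean has expected square 1/n_visits.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # One random +/-1 label vector per visit (illustrative assumption).
    flips = rng.choice([-1.0, 1.0], size=(n_visits, dim))
    # Closed-form minimizer of the squared error to the labels.
    prediction = flips.mean(axis=0)
    # Averaging over coordinates reduces the variance of the 1/n estimate.
    inverse_count = np.mean(prediction ** 2)
    return 1.0 / inverse_count


if __name__ == "__main__":
    for n in (1, 4, 10, 50):
        print(n, estimate_count(n, rng=np.random.default_rng(n)))
```

In this sketch a larger `dim` tightens the estimate, because more independent coordinates are averaged. With a function approximator in place of the closed-form mean, the same regression target could be shared across similar states, which is presumably what makes the approach usable in high-dimensional state spaces.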
